Skip to content

fix(notifications): push notification reliability and reaction context#699

Draft
Just-Insane wants to merge 268 commits intoSableClient:devfrom
Just-Insane:fix/push-notifications
Draft

fix(notifications): push notification reliability and reaction context#699
Just-Insane wants to merge 268 commits intoSableClient:devfrom
Just-Insane:fix/push-notifications

Conversation

@Just-Insane
Copy link
Copy Markdown
Contributor

Summary

Consolidates three push-notification reliability branches. Fixes silent notification drops on mobile (iOS/Android), improves service worker session handling, and ensures reaction notifications include correct room and user context.


Changes

fix/clear-cache-sw-unregister

  • When the user triggers "Clear Cache" in settings, unregister all service workers before clearing. Prevents the SW from serving stale cached assets after a cache wipe.
  • Replaces the conditional if (self.__WB_MANIFEST) guard with precacheAndRoute(self.__WB_MANIFEST ?? []) for Workbox compatibility.

fix/reaction-notification-context

  • Pass room and userId to the reaction notification filter so it can correctly determine whether a reaction notification belongs to the current user.
  • Open joined rooms at the live timeline (bottom) when tapped from a notification, rather than at the last-read marker.
  • Defer the event-scoped jump until the target event is present in the live timeline, preventing a jump to index 0 on rooms with late-loading history.
  • Fix isEncryptedRoom flag in pushNotification.ts (was hardcoded false).

fix/sw-push-session-recovery

  • Session preloading: warm a cached Matrix session in the SW on install/activate so push handlers don't have to wait for a full requestSession round-trip.
  • Reliable mobile push: use getEffectiveEvent()?.type for decrypted event type checks; wrap all push handlers in try/catch with a fallback generic notification to prevent silent drops on iOS.
  • Parallel decryption: replace sequential requestDecryptionFromClient with Promise.any + shared timeout to fan out decryption across multiple open clients simultaneously.
  • Session TTL: increase preloaded session TTL to 24 h; add requestSessionWithTimeout fallback.
  • SW heartbeat: reset heartbeat backoff on foreground sync; kick sliding sync on foreground return.
  • Visibility signals: use clients.matchAll() visibilityState (live) instead of a stored flag to determine whether to suppress a push notification.
  • appEvents multi-subscriber: convert to a multi-subscriber pattern and cancel retry timers correctly.
  • Experiment config types: add experimentConfig, sessionSync types and useExperimentVariant to useClientConfig for feature-flag-gated SW sync phases.

Testing Checklist

  • Push notification arrives while app is backgrounded: notification shown, tap opens correct room
  • Reaction notification: correct sender context, not shown for own reactions
  • Notification tap on joined room: opens at live timeline (bottom)
  • Notification tap on event-specific link: jumps to correct message once loaded
  • Clear Cache: service workers unregistered, fresh assets loaded on next visit
  • No duplicate notifications when app is foregrounded

Changeset

  • fix: Push notification reliability (session recovery, encryption context, silent-drop prevention)
  • fix: Reaction notification context
  • feat: Unregister service workers on Clear Cache

Updated instructions for pull requests and feature flags.
Added instructions for syncing branches before creating a new branch.
Updated wording for clarity and consistency in instructions.
…resence data

- DirectDMsList: show PresenceBadge on DM avatar — actual presence for 1:1 DMs,
  green dot when any participant is online for group DMs
- AccountSwitcherTab: show PresenceBadge on own account avatar in sidebar
- Fix AvatarPresence placement: move wrapper outside SidebarAvatar (overflow:hidden
  was clipping the badge)
- useUserPresence: reset presence state when userId changes; add REST fallback for
  sliding sync (Synapse MSC4186 has no presence extension so m.presence events are
  never delivered via sync — GET /presence/:userId/status bootstraps the initial state)
- ClientNonUIFeatures: explicitly PUT /presence/:userId/status on visibility change
  so the server records online/offline state; setSyncPresence is a no-op on MSC4186
Wrap the inner callback with a no-op .catch() so fire-and-forget call
sites (e.g. loadSrc in useEffect) do not produce 'Uncaught (in promise)'
console warnings. The promise is still returned and re-thrown for callers
that await or chain, so intentional error handling is unaffected.
Add scripts/inject-client-config.js which reads HOMESERVER_LIST,
ELEMENT_CALL_URL, EXPERIMENTS and other config keys from the GH Actions
environment and merges them into config.json at build time.
CI workflows pass these through via env; the setup action prints an
injected-config summary in the job summary.
knip.json updated to include the new script as an entry point.
…ntages

useClientConfig.ts gains getExperimentVariant() which deterministically
buckets a userId into a variant using a hash of userId+experimentName, then
checks it against rolloutPercentage. Experiment defaults shape is typed so
all callers get compile-time checking of known experiment names.
ExperimentsPanel shows every experiment name, current variant, rollout
percentage, and whether the user is enrolled — readable without opening
the console. DevelopTools.tsx wires it into the developer settings tab.
Just-Insane and others added 30 commits April 21, 2026 16:28
…elay

When navigator.serviceWorker.controller is null — which happens when a
window is opened by a notification tap before the SW has claimed it, or
transiently during a SW update cycle — postMessage calls were silently
dropped.  This caused the decryption-result reply to never reach the SW,
so the 5-second timeout fired and the notification either showed
"Encrypted message" or was skipped entirely.  Subsequent notifications
after the first tended to fail because the tab that received the push
was often the freshly-opened uncontrolled window.

The same null-controller bug affected enablePushNotifications /
disablePushNotifications: if the togglePush message was dropped, the
homeserver pusher state was never updated, so no further pushes arrived.

Fix: mirror the pattern already used by SyncNotificationSettingsWithSW
and setNotificationSettings in the same file — send to both controller
(fast path, usually correct) and registration.active (belt-and-suspenders
via ready.then).  Double delivery is safe: the SW's decryptionPendingMap
deduplicates pushDecryptResult, and duplicate pusher SET/DELETE requests
to the homeserver are idempotent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Duplicate const introduced during integration merge causing build failure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The integration rebuild partially included perf/timeline-rendering — it
got the 'align initial room-fill thresholds' and 'stabilize bottom pin'
commits but missed the 'feat(timeline): restore room cache by event anchor'
commit chain. That commit introduced several refs and a utility module that
the later commits depend on, causing a ReferenceError crash on room open:

  readyBlockedByPaginationRef is not defined (already restored earlier)
  saveRoomScrollStateRef / currentScrollFingerprintRef (type-only — no-op without wiring)
  prevScrollSizeRef (content-grow detection in handleVListScroll)
  isReadyRef (scroll-event guard during init)
  RoomScrollFingerprint type (return type of buildRoomScrollFingerprint)

Restore:
- src/app/utils/roomScrollCache.ts (deleted during merge)
- import { roomScrollCache, RoomScrollCache, RoomScrollFingerprint,
  RoomScrollPosition } from roomScrollCache
- const prevScrollSizeRef = useRef(0)
- const isReadyRef = useRef(false); isReadyRef.current = isReady
- const saveRoomScrollStateRef / currentScrollFingerprintRef

The full scroll-position restore wiring (roomScrollCache.save/load,
saveRoomScrollState callback, restoreRoomScrollPosition) is not yet wired
up — the saveRoomScrollStateRef.current stays undefined so those call sites
are no-ops — but all runtime references are now defined and the app no
longer crashes on room open.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- RoomTimeline: remove unused TIMELINE_ANCHOR_SELECTOR, buildRoomScrollFingerprint,
  currentScrollFingerprintRef (incomplete perf/timeline wiring stubs)
- RoomTimeline: narrow import to RoomScrollCache only (RoomScrollFingerprint unused)
- sw-session: remove void operator to satisfy no-void rule
- Auto-fixed prettier formatting in PushNotifications, ClientNonUIFeatures,
  useProcessedTimeline, BackgroundNotifications

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add Discord-style message grouping where consecutive messages from the
same sender within a configurable time window collapse the sender header.

- Add messageGroupingThreshold to Settings type (default: 2 min)
- Replace hardcoded 2-min threshold in useProcessedTimeline with the setting
- Wire useSetting in RoomTimeline and pass to useProcessedTimeline
- Add MessageGrouping component in Experimental settings with 5 options:
  2 min (default), 5 min, 15 min (Discord-style), 30 min, 60 min

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

# Conflicts:
#	src/app/features/settings/experimental/Experimental.tsx
Previously skipWaiting() was called automatically in the install event,
causing every new deployment to immediately activate the new SW mid-session.
When users then navigated to a lazy-loaded route, the old chunk hash no
longer existed on the server → 404 → chunk-error handler → silent reload.
This presented as the app 'randomly restarting' after deployments.

Changes:
- Remove auto-skipWaiting() from SW install handler
- Add SKIP_WAITING_AND_CLAIM message handler — the update prompt already
  sends this message; now the SW acts on it with self.skipWaiting()
- Deduplicate the double navigator.serviceWorker.register() call in
  index.tsx (was registering twice with different options in dev mode)
- Move sendSessionToSW into the single registration .then() handler so
  the session is sent as soon as the registration resolves, and keep the
  .ready fallback for the case where registration already settled

The new SW will now sit in 'waiting' state until the user confirms the
update prompt ('A new version of the app is available. Refresh?').
Existing sessions are uninterrupted until then.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously all three room list virtualizers used `key={vItem.index}`
(position-based). When a room moves position (e.g. a DM jumps to the
top after a new message), React reuses the component at that index
rather than remounting it. This caused `useRoomHasUnread`'s lazy
`useState` initializer to retain stale state from the previous room at
that index — resulting in the unread dot appearing on the wrong room
until the next Timeline/Receipt event fired.

Fix: add `getItemKey` to each `useVirtualizer` config so TanStack
Virtual assigns stable room-ID-based keys, then use `vItem.key` on
`<VirtualTile>`. React now correctly unmounts/remounts when a different
room occupies a position, so all per-room state is fresh.

- Direct.tsx: getItemKey = sortedDirects[index]
- Home.tsx:   getItemKey = sortedRooms[index]
- Space.tsx:  getItemKey = hierarchy[index]?.roomId ?? index

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Matrix presence is per-user on the server, not per-device. When an idle
device sends `setPresence({ presence: 'unavailable' })`, the shared
server state changes for all clients. The active device's PresenceFeature
only re-sends its state when `autoIdled`, `presenceMode`, or
`sendPresence` changes — none of which fire on the active device, so the
idle device permanently 'wins' until the user switches tabs or interacts.

Fix: add a 2-minute heartbeat that re-asserts `{ presence: 'online' }`
Within one heartbeat cycle the active device wins back the server state.
The heartbeat is idle-free (stops when autoIdled or mode changes), so it
doesn't fight against intentional DND/offline/idle changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r errors

Previously, the pusher was only re-registered with Sygnal on visibility
changes (backgrounding/foregrounding). If the browser rotated the push
subscription endpoint while the app was closed, there was a window from
app open until the first background event where pushes would silently
fail (Sygnal 410s on the stale endpoint).

Also, the togglePusher Promise was fire-and-forget — unhandled rejections
from enablePushNotifications (e.g. pushManager.subscribe() failure due to
transient network error) were silently swallowed, leaving the pusher in an
unknown state with no log trail.

Fixes:
- Call togglePusher once immediately on mount with the current visibility
  state so the endpoint is always current on app startup
- Wrap all togglePusher calls with .catch(debugLog.warn) so failures are
  surfaced in the debug log and Sentry

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… zero

FaviconUpdater's badge-clearing logic only runs inside the roomToUnread
useEffect. If roomToUnread reached highlightTotal=0 while the app was
hidden (background sync from another device), document.visibilityState
was not 'visible' so clearAppBadge was skipped. When the user then
foregrounds the app roomToUnread hasn't changed so the effect doesn't
re-run and the stale badge stays set.

Fix: add a visibilitychange listener that clears the badge on foreground
whenever highlightTotalRef is 0. A ref tracks the latest value so the
listener doesn't need to close over the roomToUnread effect.

Also suppress the pre-existing no-console lint warning on the push relay
error path with an inline disable comment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two fallback paths in getUnreadInfo were accidentally dropped during a
checkpoint commit (2eaefb19d):

1. PushProcessor fallback: when SDK total/highlight are both 0 but
   roomHaveUnread() confirms there ARE unread events (stale SDK counters),
   walk the live timeline with PushProcessor to compute real counts.

2. No-receipt sliding sync fallback: rooms that have never been visited
   in sliding sync have no read receipt, making SDK counts unreliable.
   If the timeline has activity from other users, return total=1 (or the
   existing SDK counts when present) so the dot badge appears.

These fallbacks are essential for non-DM rooms: DMs benefit from
shouldForceDMHighlight which forces highlight=total, but regular rooms
rely entirely on these paths when SDK counts are zero or stale.

Restores parity with the dev branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove the 24h TTL check from loadPersistedSession(). Sessions cached
  before the persistedAt field was added (age=Infinity) were being
  incorrectly rejected, causing every push notification to fall back to
  the generic 'New Message' text until the user opened the app again.
  Matrix access tokens are long-lived; if a token is truly revoked the
  downstream 401 handling in handleMinimalPushPayload provides the
  graceful fallback already.

- Wrap event.data.json() in a try/catch with a 'New Message' fallback
  notification. A malformed push payload previously caused an unhandled
  rejection in onPushNotification, which could silently drop the push
  event on iOS.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On iOS PWA, visibilityState can get stuck at 'visible' after the user
opens the app via a notification tap and then backgrounds it again.
This caused all subsequent push notifications to be silently suppressed
("first notification works, second and beyond don't").

The fix: require BOTH visibilityState === 'visible' AND client.focused.
The focused property transitions to false immediately when the user
presses the home button (before visibilitychange fires), so it is the
reliable signal that the user is actually looking at the app.

Also log focused state in the existing debug output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a user opens a room via notification tap and loadEventTimeline
encounters an error (e.g. network not yet ready after iOS app resume),
the onError handler resets to the live timeline but never called
setIsReady(true). This left the timeline permanently invisible
(opacity: 0), producing a blank white screen.

Add onJumpError callback to UseTimelineSyncOptions. RoomTimeline passes
handleJumpError which calls setIsReady(true). The error handler in
useTimelineSync now calls onJumpError() after restoring the live
timeline and scrolling to bottom, so the room becomes visible.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After iOS kills a backgrounded PWA and the app cold-starts, SyncStatus
was showing a Connecting... banner for up to 20 seconds after the loading
screen disappeared. This is because the banner fired on any transition
where previous !== SyncState.Syncing, including the initial
null→PREPARED→SYNCING cold-start sequence.

With sliding sync the second long-poll can take up to 20 s (the poll
timeout) when there are no new messages, keeping the banner visible the
entire time even though the app is fully loaded and usable.

Fix: only show Connecting... when recovering from an actual disconnect
(previous === Reconnecting or previous === Error). The loading screen
already covers the initial connection; the banner is only meaningful
when we are re-connecting after a network disruption.

State transitions:
  - null → PREPARED: no banner (loading screen handles it) ✓
  - RECONNECTING → PREPARED/SYNCING: shows banner ✓
  - ERROR → PREPARED/SYNCING: shows banner ✓
  - SYNCING → SYNCING: no banner ✓

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The SW has two paths where it suppresses OS notifications when the app
appears visible:

1. hasVisibleClient (already fixed): requires both visibilityState === 'visible'
   AND client.focused from clients.matchAll().

2. Decrypt relay result (this commit): after requestDecryptionFromClient(),
   the code checked only result?.visibilityState === 'visible' to decide
   whether the in-app UI was already showing the message.

On iOS PWA, document.visibilityState can get stuck at 'visible' after the
user backgrounds the app because visibilitychange doesn't always fire.
document.hasFocus() is updated by the browser process immediately when the
OS removes window focus (home button press), so it is the reliable signal.

Add appFocused: document.hasFocus() to both pushDecryptResult reply payloads
in ClientNonUIFeatures.tsx.  In sw.ts, require BOTH visibilityState ===
'visible' AND appFocused === true before skipping the OS notification,
mirroring the same dual-guard logic in hasVisibleClient.

This closes the second suppression path that was silently dropping background
push notifications on iOS after the user had previously opened the app from a
notification tap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous change narrowed the 'Connecting...' banner to only show
on reconnect/error. Reverting to upstream's condition:

  stateData.previous !== SyncState.Syncing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant